1. Data

Load the student scores for the test:

## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3988 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5501 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

Show data summary

skim(test_scores)
Data summary
Name test_scores
Number of rows 8993
Number of columns 28
_______________________
Column type frequency:
character 4
numeric 24
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 7 7 0 9 0
class 8993 0 NA NA 0 0 0
AnonID 0 1 9 9 0 8899 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 39 1.00 4.38 1.48 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A2_B0 5500 0.39 4.23 1.73 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A3_B2 211 0.98 3.62 2.24 0 0.00 5.0 5.00 5 ▃▁▁▁▇
A4_B3 132 0.99 4.00 1.43 0 3.00 5.0 5.00 5 ▁▁▂▂▇
A5_B4 375 0.96 4.20 1.83 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A6_B5 1377 0.85 1.99 1.89 0 0.00 2.0 2.00 5 ▇▇▁▁▅
A7_B6 716 0.92 4.00 1.93 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A8_B0 5500 0.39 3.03 2.44 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A9_B8 199 0.98 4.37 1.66 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A10_B9 244 0.97 3.62 1.90 0 2.50 5.0 5.00 5 ▂▁▃▁▇
A11_B0 5500 0.39 3.89 1.67 0 2.50 5.0 5.00 5 ▁▁▂▁▇
A12_B12 371 0.96 4.20 1.74 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A13_B13 304 0.97 3.85 1.85 0 2.00 5.0 5.00 5 ▂▂▁▁▇
A14_B14 576 0.94 2.97 1.91 0 2.00 2.0 5.00 5 ▃▆▁▁▇
A15_B15 352 0.96 3.60 2.01 0 2.50 5.0 5.00 5 ▂▁▂▁▇
A16_B16 270 0.97 4.49 1.51 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A17_B17 200 0.98 4.62 1.33 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A18_B18 367 0.96 3.77 2.15 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A19_B19 1044 0.88 3.17 2.41 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A20_B20 920 0.90 2.37 2.50 0 0.00 0.0 5.00 5 ▇▁▁▁▇
Total 0 1.00 69.65 21.55 0 58.50 74.0 85.50 100 ▁▁▃▇▇
A0_B7 4149 0.54 2.26 2.49 0 0.00 0.0 5.00 5 ▇▁▁▁▆
A0_B10 4515 0.50 2.77 1.73 0 1.25 2.5 4.38 5 ▃▇▂▅▇
A0_B11 4008 0.55 4.18 1.41 0 3.00 5.0 5.00 5 ▁▂▁▁▇

There are some NAs in the data. These are exclusively in the results from the post version, where Moodle recorded a null response as NA (i.e. cases where students did not give an answer that could be graded).

For this analysis, we replace the NA scores with 0, reflecting an incorrect answer.

test_scores = bind_rows(
  "pre" = test_scores_pre %>%
    replace_names(old_names = test_versions$pre, new_names = test_versions$label),
  "post" = test_scores_post %>%
    mutate(across(matches("^B\\d"), ~ replace_na(., 0))) %>% 
    replace_names(old_names = test_versions$post, new_names = test_versions$label),
  .id = "test_version"
)
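The chunk above relies on a replace_names() helper defined in common-functions.R, which is not shown in this section. As a rough sketch of its likely behaviour (an assumption, not the actual implementation): it renames the item columns from one version's labels to the shared labels, skipping items that are absent from that version.

```r
# Hypothetical sketch of replace_names() (the real version lives in
# common-functions.R): rename any columns of df listed in old_names to
# the corresponding entries of new_names. Entries of old_names that are
# NA, or not present in df (items missing from this test version), are
# skipped. Assumes the tidyverse is loaded.
replace_names <- function(df, old_names, new_names) {
  present <- !is.na(old_names) & old_names %in% names(df)
  df %>%
    rename(all_of(set_names(old_names[present], new_names[present])))
}
```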

Show data summary

skim(test_scores)
Data summary
Name test_scores
Number of rows 8993
Number of columns 28
_______________________
Column type frequency:
character 4
numeric 24
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 7 7 0 9 0
class 8993 0 NA NA 0 0 0
AnonID 0 1 9 9 0 8899 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 0 1.00 4.36 1.50 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A2_B0 5500 0.39 4.23 1.73 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A3_B2 0 1.00 3.53 2.28 0 0.00 5.00 5.00 5 ▃▁▁▁▇
A4_B3 0 1.00 3.94 1.50 0 3.00 5.00 5.00 5 ▁▁▂▂▇
A5_B4 0 1.00 4.03 1.98 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A6_B5 0 1.00 1.69 1.89 0 0.00 2.00 2.00 5 ▇▆▁▁▃
A7_B6 0 1.00 3.68 2.14 0 2.50 5.00 5.00 5 ▃▁▁▁▇
A8_B0 5500 0.39 3.03 2.44 0 0.00 5.00 5.00 5 ▅▁▁▁▇
A9_B8 0 1.00 4.27 1.77 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A10_B9 0 1.00 3.52 1.97 0 2.50 5.00 5.00 5 ▂▁▃▁▇
A11_B0 5500 0.39 3.89 1.67 0 2.50 5.00 5.00 5 ▁▁▂▁▇
A12_B12 0 1.00 4.02 1.90 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A13_B13 0 1.00 3.72 1.95 0 2.00 5.00 5.00 5 ▂▂▁▁▇
A14_B14 0 1.00 2.78 1.99 0 1.00 2.00 5.00 5 ▅▆▁▁▇
A15_B15 0 1.00 3.46 2.09 0 1.25 5.00 5.00 5 ▃▁▂▁▇
A16_B16 0 1.00 4.36 1.67 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A17_B17 0 1.00 4.51 1.48 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A18_B18 0 1.00 3.62 2.24 0 0.00 5.00 5.00 5 ▃▁▁▁▇
A19_B19 0 1.00 2.81 2.48 0 0.00 5.00 5.00 5 ▆▁▁▁▇
A20_B20 0 1.00 2.13 2.47 0 0.00 0.00 5.00 5 ▇▁▁▁▆
Total 0 1.00 69.65 21.55 0 58.50 74.00 85.50 100 ▁▁▃▇▇
A0_B7 3493 0.61 1.99 2.45 0 0.00 0.00 5.00 5 ▇▁▁▁▅
A0_B10 3493 0.61 2.25 1.89 0 0.00 1.88 4.38 5 ▇▆▂▃▇
A0_B11 3493 0.61 3.78 1.81 0 2.00 5.00 5.00 5 ▂▂▁▁▇

Data cleaning

  1. For students who took the test more than once, keep only the attempt with the highest total score and remove the others;

  2. Remove students who scored zero on three or more of the 5 easiest questions in the second half of the test; and

  3. Add back any students scoring 30 or more marks in total.

# keep a copy
test_scores_unfiltered <- test_scores

test_scores <- test_scores_unfiltered %>% 
  group_by(AnonID) %>% 
  slice_max(Total, n = 1) %>% 
  ungroup() %>% 
  rowwise() %>% 
  mutate(invalid_in_easiest_5 = case_when(
    test_version == "pre" ~ sum(A11_B0==0, A12_B12==0, A13_B13==0, A16_B16==0, A17_B17==0),
    test_version == "post" ~ sum(is.na(A0_B11), is.na(A12_B12), is.na(A16_B16), is.na(A17_B17), is.na(A18_B18))
  )
  ) %>% 
  filter(invalid_in_easiest_5 <= 2 | Total >= 30) %>% 
  select(-invalid_in_easiest_5) %>% 
  ungroup()

# test_scores <- test_scores_unfiltered %>%
#   group_by(AnonID) %>%
#   slice_max(Total, n = 1) %>%
#   ungroup() %>%
#   filter(Total > 0)

bind_rows(
  "unfiltered" = test_scores_unfiltered %>% select(Total),
  "filtered" = test_scores %>% select(Total),
  .id = "dataset"
) %>% 
  mutate(dataset = fct_relevel(dataset, "unfiltered", "filtered")) %>% 
  ggplot(aes(x = Total)) +
  geom_histogram(bins = 25) +
  facet_wrap(vars(dataset)) +
  theme_minimal()

Data summary

The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n
2013-14 815
2014-15 923
2015-16 719
2016-17 757
2017-18 909
2018-19 1145
2019-20 1216
2020-21 1171
2021-22 992

The distribution of raw scores is roughly the same (approximately normal) each year:

test_scores %>%
  ggplot(aes(x = Total)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(vars(year)) +
  theme_minimal()

Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):

test_scores %>% 
  select(-class, -test_version, -Total) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  select(-contains("complete")) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  #fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2013-14 2014-15 2015-16 2016-17 2017-18 2018-19 2019-20 2020-21 2021-22
n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD n_missing Mean SD
AnonID 0 NA NA 0 NA NA 0 NA NA 0 NA NA 0 NA NA 0 NA NA 0 NA NA 0 NA NA 0 NA NA
A1_B1 0 4.32 1.41 0 4.40 1.32 0 4.45 1.29 0 4.51 1.16 0 4.44 1.50 0 4.37 1.54 0 4.51 1.40 0 4.49 1.41 0 4.36 1.59
A2_B0 0 4.31 1.65 0 4.41 1.53 0 4.49 1.45 0 4.54 1.37 909 NaN NA 1145 NaN NA 1216 NaN NA 1171 NaN NA 992 NaN NA
A3_B2 0 3.33 2.36 0 3.24 2.39 0 3.39 2.34 0 3.61 2.24 0 3.60 2.24 0 3.66 2.21 0 3.87 2.10 0 3.80 2.14 0 3.83 2.12
A4_B3 0 3.72 1.43 0 3.83 1.33 0 3.91 1.38 0 4.12 1.22 0 4.07 1.44 0 4.06 1.43 0 4.09 1.46 0 4.17 1.40 0 4.21 1.40
A5_B4 0 4.05 1.96 0 4.19 1.84 0 4.28 1.76 0 4.33 1.70 0 3.95 2.04 0 4.05 1.96 0 4.10 1.93 0 4.21 1.82 0 4.05 1.96
A6_B5 0 1.47 1.84 0 1.52 1.85 0 1.81 1.95 0 2.04 1.97 0 1.75 1.82 0 1.66 1.83 0 1.77 1.87 0 1.83 1.91 0 1.85 1.96
A7_B6 0 3.57 2.17 0 3.47 2.22 0 3.69 2.12 0 4.15 1.83 0 3.80 2.08 0 3.81 2.06 0 3.93 2.00 0 3.92 1.99 0 3.69 2.14
A8_B0 0 2.97 2.46 0 3.04 2.44 0 3.32 2.36 0 3.71 2.19 909 NaN NA 1145 NaN NA 1216 NaN NA 1171 NaN NA 992 NaN NA
A9_B8 0 4.31 1.73 0 4.46 1.55 0 4.44 1.57 0 4.66 1.27 0 4.22 1.81 0 4.38 1.64 0 4.39 1.64 0 4.41 1.62 0 4.33 1.70
A10_B9 0 3.19 2.03 0 3.29 2.00 0 3.34 1.97 0 3.62 1.88 0 3.65 1.85 0 3.65 1.89 0 3.78 1.86 0 3.92 1.78 0 3.91 1.79
A11_B0 0 4.05 1.45 0 4.16 1.40 0 4.13 1.42 0 4.26 1.34 909 NaN NA 1145 NaN NA 1216 NaN NA 1171 NaN NA 992 NaN NA
A12_B12 0 3.99 1.90 0 3.93 1.95 0 4.10 1.81 0 4.29 1.65 0 4.16 1.79 0 4.15 1.79 0 4.26 1.71 0 4.25 1.72 0 4.20 1.76
A13_B13 0 3.69 1.95 0 3.68 1.94 0 3.82 1.82 0 4.07 1.71 0 3.72 1.91 0 3.83 1.88 0 3.92 1.83 0 3.96 1.80 0 3.85 1.88
A14_B14 0 2.44 1.93 0 2.54 1.93 0 2.85 1.92 0 3.03 1.90 0 2.85 1.95 0 2.88 1.93 0 3.05 1.94 0 3.02 1.96 0 3.05 2.01
A15_B15 0 3.57 2.00 0 3.51 2.02 0 3.67 1.95 0 3.69 2.02 0 3.54 2.05 0 3.50 2.05 0 3.46 2.10 0 3.73 1.97 0 3.57 2.08
A16_B16 0 4.51 1.49 0 4.49 1.52 0 4.55 1.43 0 4.67 1.24 0 4.45 1.57 0 4.41 1.61 0 4.43 1.59 0 4.47 1.54 0 4.59 1.37
A17_B17 0 4.69 1.21 0 4.59 1.38 0 4.74 1.12 0 4.76 1.08 0 4.65 1.28 0 4.59 1.37 0 4.65 1.28 0 4.64 1.30 0 4.65 1.27
A18_B18 0 3.48 2.30 0 3.45 2.31 0 3.65 2.22 0 3.84 2.11 0 3.72 2.18 0 3.66 2.21 0 3.88 2.08 0 3.89 2.08 0 3.96 2.03
A19_B19 0 2.28 2.49 0 2.26 2.49 0 2.60 2.50 0 3.01 2.45 0 2.89 2.47 0 3.07 2.44 0 3.17 2.41 0 3.25 2.39 0 3.26 2.38
A20_B20 0 1.82 2.41 0 1.81 2.41 0 1.99 2.45 0 2.46 2.50 0 2.21 2.48 0 2.29 2.49 0 2.37 2.50 0 2.25 2.49 0 2.43 2.50
A0_B7 815 NaN NA 923 NaN NA 719 NaN NA 757 NaN NA 0 1.90 2.43 0 1.91 2.43 0 2.02 2.45 0 2.14 2.48 0 2.03 2.46
A0_B10 815 NaN NA 923 NaN NA 719 NaN NA 757 NaN NA 0 2.01 1.84 0 2.07 1.84 0 2.36 1.89 0 2.48 1.91 0 2.38 1.93
A0_B11 815 NaN NA 923 NaN NA 719 NaN NA 757 NaN NA 0 3.77 1.82 0 3.78 1.79 0 3.88 1.75 0 3.85 1.79 0 3.77 1.84

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data on the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script, since the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

3. Fitting the generalised partial credit IRT model

The mirt implementation of the generalised partial credit model (gpcm) requires that the partial marks are consecutive integers. We therefore work around this by recoding our scores into that form (e.g. replacing scores of 0, 2.5, 5 with 1, 2, 3), while keeping track of the true score attached to each mark level so that we can properly compute expected scores later on.

# save just the matrix of scores
item_scores <- test_scores %>% 
  select(matches("^A\\d"))

# Determine the mark levels for each item
mark_levels <- item_scores %>% 
  pivot_longer(everything(), names_to = "item", values_to = "score") %>% 
  distinct() %>% 
  # we don't want to give a level to NA
  filter(!is.na(score)) %>% 
  arrange(parse_number(item), score) %>% 
  group_by(item) %>%
  mutate(order = row_number()) %>% 
# Note that the convention used by mirt is for items that have only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
  mutate(level = case_when(
    max(order) == 2 ~ order - 1,
    TRUE ~ order * 1.0
  )) %>% 
  mutate(levelname = paste0(item, ".P.", level))

# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>% 
  # temporarily add row identifiers
  mutate(row = row_number()) %>% 
  pivot_longer(cols = -row, names_to = "item", values_to = "score") %>% 
  left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>% 
  select(-score) %>% 
  pivot_wider(names_from = "item", values_from = "level") %>% 
  select(-row)

Show model fitting output

irt_fit <- mirt(
  data = item_scores_levelled, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "gpcm",  # generalised partial credit model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -140201.896, Max-Change: 2.44104
Iteration: 2, Log-Lik: -125840.511, Max-Change: 0.51042
Iteration: 3, Log-Lik: -124316.163, Max-Change: 0.48192
Iteration: 4, Log-Lik: -124054.077, Max-Change: 0.18441
Iteration: 5, Log-Lik: -123928.237, Max-Change: 0.13866
Iteration: 6, Log-Lik: -123873.691, Max-Change: 0.12047
Iteration: 7, Log-Lik: -123840.232, Max-Change: 0.07726
Iteration: 8, Log-Lik: -123829.225, Max-Change: 0.06291
Iteration: 9, Log-Lik: -123822.166, Max-Change: 0.02540
Iteration: 10, Log-Lik: -123817.865, Max-Change: 0.03143
Iteration: 11, Log-Lik: -123815.728, Max-Change: 0.01224
Iteration: 12, Log-Lik: -123814.106, Max-Change: 0.03540
Iteration: 13, Log-Lik: -123812.966, Max-Change: 0.00751
Iteration: 14, Log-Lik: -123812.611, Max-Change: 0.00306
Iteration: 15, Log-Lik: -123812.455, Max-Change: 0.00220
Iteration: 16, Log-Lik: -123812.314, Max-Change: 0.00439
Iteration: 17, Log-Lik: -123812.238, Max-Change: 0.00048
Iteration: 18, Log-Lik: -123812.233, Max-Change: 0.00158
Iteration: 19, Log-Lik: -123812.206, Max-Change: 0.00065
Iteration: 20, Log-Lik: -123812.203, Max-Change: 0.00734
Iteration: 21, Log-Lik: -123812.159, Max-Change: 0.00062
Iteration: 22, Log-Lik: -123812.158, Max-Change: 0.00062
Iteration: 23, Log-Lik: -123812.149, Max-Change: 0.00007
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).
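The irt_residuals object used below is computed from the fitted model. A minimal sketch using mirt's residuals() method (assuming the irt_fit object fitted above):

```r
# Pairwise Q3 statistics: correlations between item residuals after
# controlling for theta, returned by mirt as a square matrix
irt_residuals <- residuals(irt_fit, type = "Q3") %>%
  as.data.frame()
```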

irt_residuals  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only one showing cause for concern:

irt_residuals %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A16_B16 A17_B17 0.2742701

This pair of questions on variations of the chain rule was also identified by the analysis of the first version of the test.

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. We use the fscores() function from the mirt package, which takes a fitted model object and estimates factor scores by the specified method; here we use the default EAP (“expected a-posteriori”) method.

test_scores <- test_scores %>%
  mutate(F1 = fscores(irt_fit, method = "EAP"))

We can also extract the model coefficient estimates using the generic coef() function. Setting the IRTpars argument to TRUE converts the slope-intercept parameters into traditional IRT parameters.

irt_coefs <- coef(irt_fit, IRTpars = TRUE)

The resulting object irt_coefs is a list, with one element for each question, plus an additional GroupPars element that we won’t be using. For each question, the object records several values:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown
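For instance, the entry for a single question can be inspected directly; with SE = TRUE, each element is a small matrix whose rows are the point estimates (par) and the 95% confidence interval endpoints (CI_2.5, CI_97.5):

```r
# Coefficients for the first item: columns are the GPCM parameters
# (a, b1, b2, ...), rows are the estimate and the CI bounds
irt_coefs$A1_B1
```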

To make this output a little more user friendly, we use the tidy_mirt_coefs function that we have provided in common-functions.R, to produce a single data frame with a row for each question.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a, or starts with b (the parameters in the GPCM)
    filter(X2 == "a" | str_detect(X2, "^b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # turn into a wider data frame
    pivot_wider(names_from = X1, values_from = value) %>% 
    rename(par = X2)
}

# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_irt_coefs <- map_dfr(head(irt_coefs, -1), tidy_mirt_coefs, .id = "Question")

Here is a nicely formatted table of the result:

tidy_irt_coefs %>% 
  filter(par == "a") %>% 
  select(-par) %>% 
  rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>% 
  left_join(
    tidy_irt_coefs %>% 
      filter(str_detect(par, "^b")),
    by = "Question"
  ) %>% 
  gt(groupname_col = "Question") %>%
  fmt_number(columns = matches("est|_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = c("est", "CI_2.5", "CI_97.5"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
Discrimination Difficulty
a_est a_CI_2.5 a_CI_97.5 par est CI_2.5 CI_97.5
A1_B1
0.5903876 0.5406239 0.6401512 b1 -0.8762878 -1.06364420 -0.688931314
0.5903876 0.5406239 0.6401512 b2 -4.0789207 -4.42802857 -3.729812755
A2_B0
0.3986462 0.3187332 0.4785592 b1 1.7718551 1.02833546 2.515374645
0.3986462 0.3187332 0.4785592 b2 -8.0197432 -9.63636007 -6.403126329
A3_B2
1.3212249 1.2384910 1.4039587 b1 -0.9534731 -1.01282209 -0.894124127
A4_B3
0.2309112 0.2172789 0.2445435 b1 17.9345439 13.26317007 22.605917658
0.2309112 0.2172789 0.2445435 b2 -20.0855489 -24.74154977 -15.429548117
0.2309112 0.2172789 0.2445435 b3 2.3054677 1.36584974 3.245085623
0.2309112 0.2172789 0.2445435 b4 4.4669957 2.83061016 6.103381194
0.2309112 0.2172789 0.2445435 b5 -11.7090503 -13.30473478 -10.113365775
0.2309112 0.2172789 0.2445435 b6 0.7089301 0.08495152 1.332908597
0.2309112 0.2172789 0.2445435 b7 -3.7663865 -4.36218527 -3.170587795
0.2309112 0.2172789 0.2445435 b8 4.9997854 4.24721333 5.752357564
0.2309112 0.2172789 0.2445435 b9 -2.3612201 -3.12636028 -1.596079945
0.2309112 0.2172789 0.2445435 b10 -3.4936624 -4.08792699 -2.899397732
0.2309112 0.2172789 0.2445435 b11 8.8254101 7.78080083 9.870019339
0.2309112 0.2172789 0.2445435 b12 -16.7420306 -18.07028409 -15.413777163
A5_B4
1.1972742 1.1121764 1.2823720 b1 -1.6172652 -1.71262347 -1.521906860
A6_B5
0.4232197 0.3940411 0.4523983 b1 0.4384589 0.31206105 0.564856713
0.4232197 0.3940411 0.4523983 b2 9.7516946 8.83264959 10.670739570
0.4232197 0.3940411 0.4523983 b3 -7.8377526 -8.72431817 -6.951187061
A7_B6
0.7799881 0.7318648 0.8281114 b1 1.4296501 1.22910445 1.630195824
0.7799881 0.7318648 0.8281114 b2 -3.5713558 -3.82588725 -3.316824437
A8_B0
0.5546119 0.4573199 0.6519040 b1 -1.2715514 -1.49873771 -1.044365033
A9_B8
1.5223572 1.4129476 1.6317668 b1 -1.7638733 -1.85529597 -1.672450693
A10_B9
0.7576383 0.7104561 0.8048205 b1 -0.9151491 -1.01208093 -0.818217301
0.7576383 0.7104561 0.8048205 b2 -1.3375611 -1.45069433 -1.224427775
A11_B0
0.3095228 0.2641561 0.3548895 b1 -1.8229800 -2.66148554 -0.984474541
0.3095228 0.2641561 0.3548895 b2 -5.6073282 -6.52019880 -4.694457532
0.3095228 0.2641561 0.3548895 b3 8.8029182 7.09480184 10.511034604
0.3095228 0.2641561 0.3548895 b4 -13.1115926 -15.27758241 -10.945602764
A12_B12
0.5366094 0.5012009 0.5720178 b1 1.5253228 1.22948589 1.821159655
0.5366094 0.5012009 0.5720178 b2 -0.6549970 -0.93565037 -0.374343618
0.5366094 0.5012009 0.5720178 b3 -5.4472146 -5.85548846 -5.038940750
A13_B13
0.5463417 0.5127987 0.5798848 b1 -0.8516585 -1.00188494 -0.701432140
0.5463417 0.5127987 0.5798848 b2 3.6819922 3.27355379 4.090430615
0.5463417 0.5127987 0.5798848 b3 -6.7680450 -7.28854736 -6.247542670
A14_B14
0.5596943 0.5294767 0.5899120 b1 1.4730648 1.24197586 1.704153793
0.5596943 0.5294767 0.5899120 b2 -3.7090203 -3.95840702 -3.459633481
0.5596943 0.5294767 0.5899120 b3 8.4955450 7.62383922 9.367250858
0.5596943 0.5294767 0.5899120 b4 -3.9578263 -4.75849838 -3.157154267
0.5596943 0.5294767 0.5899120 b5 -4.3498411 -4.70942439 -3.990257823
A15_B15
0.2301284 0.2123758 0.2478809 b1 9.6750114 8.61751708 10.732505699
0.2301284 0.2123758 0.2478809 b2 -7.6088352 -8.50562713 -6.712043178
0.2301284 0.2123758 0.2478809 b3 4.1450859 3.51844982 4.771722032
0.2301284 0.2123758 0.2478809 b4 -11.7297758 -12.76409205 -10.695459492
A16_B16
1.6124513 1.4918755 1.7330271 b1 -1.8780793 -1.97442906 -1.781729494
A17_B17
1.8952926 1.7420232 2.0485619 b1 -2.0217898 -2.12129246 -1.922287163
A18_B18
1.1033837 1.0282986 1.1784687 b1 -1.2064065 -1.28444775 -1.128365235
A19_B19
1.6307235 1.5377978 1.7236492 b1 -0.2872475 -0.32643639 -0.248058601
A20_B20
1.8798335 1.7726293 1.9870376 b1 0.2062216 0.17014270 0.242300426
A0_B7
1.0276466 0.9431994 1.1120938 b1 0.5479310 0.47989805 0.615964042
A0_B10
0.3976621 0.3712880 0.4240362 b1 4.9758093 4.40615271 5.545465849
0.3976621 0.3712880 0.4240362 b2 -4.9824921 -5.50816209 -4.456822203
0.3976621 0.3712880 0.4240362 b3 2.2783264 1.93344502 2.623207804
0.3976621 0.3712880 0.4240362 b4 -0.2626685 -0.61774483 0.092407878
0.3976621 0.3712880 0.4240362 b5 1.0437582 0.66896826 1.418548160
0.3976621 0.3712880 0.4240362 b6 -0.3798211 -0.75398380 -0.005658485
0.3976621 0.3712880 0.4240362 b7 1.9002418 1.50804477 2.292438817
0.3976621 0.3712880 0.4240362 b8 -2.6886136 -3.11004432 -2.267182839
A0_B11
0.3926897 0.3645326 0.4208468 b1 2.3029079 1.77404746 2.831768405
0.3926897 0.3645326 0.4208468 b2 -5.3661915 -5.90773541 -4.824647520
0.3926897 0.3645326 0.4208468 b3 9.0795589 7.85582083 10.303296924
0.3926897 0.3645326 0.4208468 b4 -3.5559730 -4.70187071 -2.410075275
0.3926897 0.3645326 0.4208468 b5 -8.8765567 -9.70912993 -8.043983547

These values are also saved to the file output/uoe_pre-vs-post_gpcm-results.csv.

tidy_irt_coefs %>% 
  write_csv("output/uoe_pre-vs-post_gpcm-results.csv")

Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(irt_fit, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  left_join(test_versions %>% select(item = label, MATH_group), by = "item") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

Test information curve

item_info_data %>% 
  group_by(theta) %>% 
  summarise(info_y = sum(info_y)) %>% 
  ggplot(aes(x = theta, y = info_y)) +
  geom_line() +
  labs(y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info.pdf", width = 6, height = 4, units = "in")

This shows that the information given by the test is concentrated toward the lower end of the ability scale, i.e. it gives more accurate estimates for students whose ability is slightly below the mean.

Item information curves

Breaking this down by question helps to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()

We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)),
    items_C = sum(ifelse(MATH_group == "C", info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

This shows that the information in the MATH Group B questions peaks at a higher point on the ability scale than that of the MATH Group A questions.

Since the number of items in each group differs, we consider instead the mean information per item:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y) / n(),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0)),
    items_C = sum(ifelse(MATH_group == "C", info_y, 0)) / sum(ifelse(MATH_group == "C", 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Mean information per item") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)

This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.

Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_gpcm <- areainfo(irt_fit, c(-4,4))
info_gpcm %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 33.70541 36.09974 0.9336745 23

This shows that the total information across all items is approximately 36.10, of which about 93% lies in the range \(-4\leq\theta\leq4\).

Information by item

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(irt_fit,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post MATH_group description label outcome Information
NA B10 B 2x2 system possibilities A0_B10 added 3.1772358
A14 B14 B find minimum gradient of cubic A14_B14 unchanged 2.7973270
A4 B3 A completing the square A4_B3 unchanged 2.7612246
NA B11 C using max and min functions A0_B11 added 1.9590963
A17 B17 A chain rule A17_B17 unchanged 1.8952920
A20 B20 B product rule with given values A20_B20 unchanged 1.8798335
A13 B13 A equation of tangent A13_B13 unchanged 1.6352955
A19 B19 B area between curve and x-axis (in 2 parts) A19_B19 unchanged 1.6307232
A16 B16 A trig chain rule A16_B16 unchanged 1.6124480
A12 B12 A find point with given slope A12_B12 unchanged 1.6085723
A7 B6 B graphical vector sum A7_B6 unchanged 1.5598502
A9 B8 A simplify logs A9_B8 unchanged 1.5223517
A10 B9 B identify graph of rational functions A10_B9 unchanged 1.5143576
A3 B2 B composition of functions A3_B2 unchanged 1.3212157
A6 B5 A trig wave function A6_B5 unchanged 1.2639778
A11 NA A summing arithmetic progression A11_B0 removed 1.2033969
A5 B4 A trig double angle formula A5_B4 unchanged 1.1972207
A1 B1 A properties of fractions A1_B1 unchanged 1.1777765
A18 B18 A definite integral A18_B18 unchanged 1.1033115
NA B7 B angle between 3d vectors (in context) A0_B7 added 1.0275643
A15 B15 A find and classify stationary points of cubic A15_B15 unchanged 0.9123828
A2 NA A find intersection of lines A2_B0 removed 0.7900902
A8 NA A compute angle between 3d vectors A8_B0 removed 0.5491977

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(irt_fit,
               c(-2, 2),
               which.items = .x) %>% pull(Info)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post MATH_group description label outcome Information
NA B10 B 2x2 system possibilities A0_B10 added 2.6894906
A14 B14 B find minimum gradient of cubic A14_B14 unchanged 2.3637938
A20 B20 B product rule with given values A20_B20 unchanged 1.7882056
A4 B3 A completing the square A4_B3 unchanged 1.7315578
A19 B19 B area between curve and x-axis (in 2 parts) A19_B19 unchanged 1.4984133
NA B11 C using max and min functions A0_B11 added 1.3954500
A7 B6 B graphical vector sum A7_B6 unchanged 1.2153004
A13 B13 A equation of tangent A13_B13 unchanged 1.1544884
A12 B12 A find point with given slope A12_B12 unchanged 1.0519507
A3 B2 B composition of functions A3_B2 unchanged 1.0300617
A10 B9 B identify graph of rational functions A10_B9 unchanged 1.0158973
A17 B17 A chain rule A17_B17 unchanged 0.9271541
A9 B8 A simplify logs A9_B8 unchanged 0.8916069
A16 B16 A trig chain rule A16_B16 unchanged 0.8821230
NA B7 B angle between 3d vectors (in context) A0_B7 added 0.7691369
A6 B5 A trig wave function A6_B5 unchanged 0.7506395
A18 B18 A definite integral A18_B18 unchanged 0.7477240
A5 B4 A trig double angle formula A5_B4 unchanged 0.7178985
A15 B15 A find and classify stationary points of cubic A15_B15 unchanged 0.4887409
A1 B1 A properties of fractions A1_B1 unchanged 0.4321422
A11 NA A summing arithmetic progression A11_B0 removed 0.4304909
A8 NA A compute angle between 3d vectors A8_B0 removed 0.2548706
A2 NA A find intersection of lines A2_B0 removed 0.2211928

Evaluating the changes

item_info_data %>% 
  mutate(item = as.character(item)) %>% 
  left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>% 
  group_by(theta) %>% 
  summarise(
    items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
    items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
    items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
    items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(panel = ifelse(items %in% c("pre", "post"), "overall", "changes") %>% fct_rev()) %>% 
  mutate(items = fct_relevel(items, "removed", "added", "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_brewer(palette = "Paired") +
  facet_wrap(vars(panel), scales = "free_y") +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

The changes have increased the total information in the test:

info_old <- areainfo(irt_fit,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "removed")) %>% pull(item_num))
info_new <- areainfo(irt_fit,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "added")) %>% pull(item_num))

versions_info <- bind_rows("Version A" = info_old,
                           "Version B" = info_new,
                           .id = "version") %>% 
  select(-LowerBound, -UpperBound, -Info, -Proportion) %>% 
  mutate(mean = TotalInfo / nitems)

versions_info %>% gt() %>%
  cols_label(
    version = "Test version",
    TotalInfo = "Total information",
    nitems = "Number of items",
    mean = "Mean information per item"
  )
Test version Total information Number of items Mean information per item
Version A 29.93585 20 1.496792
Version B 33.55706 20 1.677853
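As a rough check on the table above, the mean information per item is simply the total divided by the 20 items, and the relative gain from version A to version B works out at roughly 12%. A minimal sketch of that arithmetic, with the totals copied from the table above:

```r
# Totals copied from the versions_info table above
total_A <- 29.93585
total_B <- 33.55706
n_items <- 20

mean_A <- total_A / n_items    # ~1.497, as tabulated
mean_B <- total_B / n_items    # ~1.678, as tabulated
gain   <- total_B / total_A - 1  # ~0.12, i.e. about a 12% increase
```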

Response curves

Since the GPCM is more complex than the 2PL, there is a characteristic curve for each possible score on each question:

trace_data <- probtrace(irt_fit, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>% 
  mutate(score = as.factor(score))

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()

To get a simplified picture for each question, we compute the expected score at each ability level:

expected_scores <- trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  group_by(item, theta) %>% 
  summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")

expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Expected score") +
  theme_minimal()

The resulting curves look quite similar to those from the 2PL, allowing for a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. the values of theta where the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):

plt <- expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")
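The spread of difficulties could also be quantified by finding, for each item, the theta at which its expected score crosses half marks. A minimal self-contained sketch of that idea, using toy logistic expected-score curves rather than the fitted `expected_scores` (the names and parameters here are illustrative only):

```r
# Sketch: locate an item's "difficulty" as the theta where its expected
# score (out of 5) crosses 2.5. Toy curves, not the fitted model.
theta_grid <- seq(-4, 4, by = 0.01)
expected_curve <- function(a, b) 5 / (1 + exp(-a * (theta_grid - b)))
half_point <- function(es) theta_grid[which.min(abs(es - 2.5))]

half_point(expected_curve(1.5, -0.5))  # crosses half marks near theta = -0.5
half_point(expected_curve(0.8,  1.0))  # a flatter (less discriminating), harder item
```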

Test response curve

total_expected_score <- expected_scores %>% 
  group_by(theta) %>% 
  summarise(
    expected_score_pre = sum(ifelse(!str_detect(item, "A0"), expected_score, 0)),
    expected_score_post = sum(ifelse(!str_detect(item, "B0"), expected_score, 0))
  ) %>% 
  pivot_longer(cols = starts_with("expected_score_"), names_prefix = "expected_score_", names_to = "test_version", values_to = "expected_score")

total_expected_score %>% 
  ggplot(aes(x = theta, y = expected_score, colour = test_version)) +
  geom_line() +
  geom_point(data = total_expected_score %>% filter(theta == 0)) +
  ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = str_glue("{test_version}: {round(expected_score, 1)}")), box.padding = 0.5) +
  scale_colour_viridis_d(option = "plasma", end = 0.7, guide = "none") +
  labs(y = "Expected score") +
  theme_minimal()

4. Factor analysis for the new test only

Factor analysis setup

Here we redo the factor analysis, but using only the data from the new version of the test.

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("B0")) %>% 
  filter(test_version == "post") %>% 
  # select only the columns with question scores (names like Ax_Bx)
  select(matches("A\\d+_B\\d+"))

The parameters package provides functions that run various checks to see whether the data are suitable for factor analysis and, if so, how many factors should be retained.

structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.94).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(190) = 21996.31, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 8 (34.78%) methods out of 23 (t, p, Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

summary(n) %>% gt()
n_Factors n_Methods
1 8
2 5
3 1
4 3
5 1
13 2
18 3

The scree plot shows the eigenvalues associated with each factor:

fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  6  and the number of components =  NA

Based on the method agreement procedure, there is clear support for a 1-factor solution (though the parallel analysis alone suggests more). We also consider the 2-factor solution.

1 Factor

Show factanal output

fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A9_B8  A10_B9 A12_B12 A13_B13 
##    0.89    0.76    0.68    0.80    0.86    0.72    0.76    0.76    0.74    0.71 
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20   A0_B7  A0_B10  A0_B11 
##    0.60    0.87    0.79    0.77    0.81    0.69    0.70    0.85    0.65    0.70 
## 
## Loadings:
##  [1] 0.57 0.53 0.51 0.54 0.63 0.56 0.54 0.59 0.55 0.33 0.49 0.45 0.37 0.49 0.49
## [16] 0.36 0.46 0.48 0.43 0.39
## 
##                Factor1
## SS loadings       4.89
## Proportion Var    0.24
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 2135.09 on 170 degrees of freedom.
## The p-value is 0
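As a sanity check on the printout: “SS loadings” is the sum of squared loadings across the 20 items, and “Proportion Var” divides that by the number of items. A quick sketch using the loadings printed above:

```r
# Loadings copied from the 1-factor printout above
loadings1 <- c(0.57, 0.53, 0.51, 0.54, 0.63, 0.56, 0.54, 0.59, 0.55, 0.33,
               0.49, 0.45, 0.37, 0.49, 0.49, 0.36, 0.46, 0.48, 0.43, 0.39)
ss <- sum(loadings1^2)              # ~4.89, matching "SS loadings"
prop_var <- ss / length(loadings1)  # ~0.24, matching "Proportion Var"
```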
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +  # label each point with the question name
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description, MATH_group), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("MATH"),
    colors = MATH_colours
  )
question factor_loading description MATH_group
A14_B14 0.6321832 find minimum gradient of cubic B
A0_B10 0.5944184 2x2 system possibilities B
A4_B3 0.5680962 completing the square A
A19_B19 0.5610161 area between curve and x-axis (in 2 parts) B
A0_B11 0.5451688 using max and min functions C
A20_B20 0.5433535 product rule with given values B
A13_B13 0.5425096 equation of tangent A
A7_B6 0.5285327 graphical vector sum B
A12_B12 0.5060842 find point with given slope A
A9_B8 0.4919972 simplify logs A
A3_B2 0.4872203 composition of functions B
A10_B9 0.4868995 identify graph of rational functions B
A17_B17 0.4809110 chain rule A
A16_B16 0.4586695 trig chain rule A
A5_B4 0.4472101 trig double angle formula A
A18_B18 0.4341581 definite integral A
A0_B7 0.3860759 angle between 3d vectors (in context) B
A6_B5 0.3749328 trig wave function A
A15_B15 0.3618988 find and classify stationary points of cubic A
A1_B1 0.3314233 properties of fractions A

The questions that load most strongly on this factor are again predominantly the MATH Group B questions.

The new “in context” question, A0_B7, has a disappointingly low factor loading (and low information, as shown above). Perhaps the context is sufficiently routine for these students.

2 Factor

Here we also investigate the 2-factor solution, to see whether these factors are interpretable.

Show factanal output

fitfact2 <- factanal(item_scores_B, factors = 2, rotation = "varimax")
print(fitfact2, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 2, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A9_B8  A10_B9 A12_B12 A13_B13 
##    0.89    0.75    0.68    0.80    0.85    0.71    0.75    0.77    0.74    0.71 
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20   A0_B7  A0_B10  A0_B11 
##    0.59    0.85    0.60    0.49    0.78    0.68    0.68    0.83    0.57    0.70 
## 
## Loadings:
##         Factor1 Factor2
## A14_B14 0.59           
## A20_B20 0.54           
## A0_B10  0.64           
## A16_B16         0.61   
## A17_B17         0.70   
## A1_B1                  
## A3_B2   0.46           
## A4_B3   0.50           
## A5_B4   0.36           
## A6_B5   0.36           
## A7_B6   0.49           
## A9_B8   0.34    0.37   
## A10_B9  0.39           
## A12_B12 0.37    0.35   
## A13_B13 0.43    0.32   
## A15_B15         0.31   
## A18_B18         0.40   
## A19_B19 0.49           
## A0_B7   0.40           
## A0_B11  0.48           
## 
##                Factor1 Factor2
## SS loadings       3.46    2.12
## Proportion Var    0.17    0.11
## Cumulative Var    0.17    0.28
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 898.6 on 151 degrees of freedom.
## The p-value is 1.28e-106
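The degrees of freedom reported for these likelihood-ratio tests follow the standard formula for a maximum-likelihood factor model with p items and k factors, df = ((p - k)^2 - (p + k)) / 2, which reproduces the 170 and 151 printed above:

```r
# Degrees of freedom for the factanal likelihood-ratio test
fa_df <- function(p, k) ((p - k)^2 - (p + k)) / 2
fa_df(20, 1)  # 170, as in the 1-factor test
fa_df(20, 2)  # 151, as in the 2-factor test above
```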
load2 <- tidy(fitfact2)
load2_plot <- load2 %>%
  rename(question = variable) %>% 
  left_join(test_versions %>% rename(question = label), by = "question") %>%
  ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
  geom_point() +
  geom_text_repel(aes(label = question), show.legend = FALSE, alpha = 0.6) +
  labs(
    x = "Factor 1 (of 2)",
    y = "Factor 2 (of 2)"
  ) +
  scale_colour_manual("MATH group", values = MATH_colours) +
  scale_shape_manual(name = "MATH group", values = c(19, 17, 18)) +
  theme_minimal()


load2_plot +
  labs(
    title = "Standardised Loadings",
    subtitle = "Showing the 2-factor model"
  )

ggsave("output/uoe_pre-vs-post_2factor.pdf", units = "cm", width = 12, height = 8, dpi = 300,
       plot = load2_plot)
main_factors <- load2 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

load2 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(test_versions %>% select(variable = label, description, MATH_group), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = abs(.) > 0.4)
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 description MATH_group
fl1 A0_B10 0.645 0.133 2x2 system possibilities B
A14_B14 0.586 0.265 find minimum gradient of cubic B
A20_B20 0.536 0.19 product rule with given values B
A4_B3 0.496 0.276 completing the square A
A7_B6 0.491 0.216 graphical vector sum B
A19_B19 0.488 0.279 area between curve and x-axis (in 2 parts) B
A0_B11 0.475 0.269 using max and min functions C
A3_B2 0.464 0.186 composition of functions B
A13_B13 0.433 0.319 equation of tangent A
A0_B7 0.404 0.096 angle between 3d vectors (in context) B
A10_B9 0.395 0.281 identify graph of rational functions B
A12_B12 0.369 0.35 find point with given slope A
A5_B4 0.361 0.259 trig double angle formula A
A6_B5 0.356 0.142 trig wave function A
A1_B1 0.276 0.181 properties of fractions A
fl2 A17_B17 0.129 0.704 chain rule A
A16_B16 0.154 0.612 trig chain rule A
A18_B18 0.254 0.396 definite integral A
A9_B8 0.341 0.367 simplify logs A
A15_B15 0.223 0.312 find and classify stationary points of cubic A

TODO interpretation

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots